mlatoz

Upper Confidence Bound (UCB)

The Multi-Armed Bandit Problem - Summary


Upper Confidence Bound Algorithm

Step 1: At each round n, we consider two numbers for each ad i:
* Ni(n) - the number of times ad i was selected up to round n,
* Ri(n) - the sum of rewards of ad i up to round n.

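Step 2: From these two numbers we compute:
* the average reward of ad i up to round n: ri(n) = Ri(n) / Ni(n),
* the confidence interval [ri(n) - Δi(n), ri(n) + Δi(n)] at round n, with Δi(n) = sqrt(3 log(n) / (2 Ni(n))).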

Step 3: We select the ad i that has the maximum UCB ri(n) + Δi(n).
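The three steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `ucb_select` and `run_ucb` and the `reward_fn` callback are hypothetical, the interval half-width is assumed to be Δi(n) = sqrt(3 log(n) / (2 Ni(n))), and showing every ad once before comparing bounds is an added assumption to avoid dividing by zero when Ni(n) = 0.

```python
import math
import random


def ucb_select(counts, sums, n):
    """Pick the ad to show at round n (1-based).

    counts[i] plays the role of Ni(n), sums[i] the role of Ri(n).
    """
    best_ad, best_bound = 0, -math.inf
    for i in range(len(counts)):
        if counts[i] == 0:
            # Assumption: an ad that was never shown is tried first,
            # since its average reward is undefined.
            return i
        average = sums[i] / counts[i]                          # ri(n)
        delta = math.sqrt(1.5 * math.log(n) / counts[i])       # Δi(n), assumed form
        if average + delta > best_bound:                       # Step 3: maximum UCB
            best_ad, best_bound = i, average + delta
    return best_ad


def run_ucb(reward_fn, num_ads, num_rounds):
    """Run the selection loop; reward_fn(ad) returns the reward for showing ad."""
    counts = [0] * num_ads    # Step 1: Ni(n)
    sums = [0.0] * num_ads    # Step 1: Ri(n)
    for n in range(1, num_rounds + 1):
        ad = ucb_select(counts, sums, n)
        reward = reward_fn(ad)
        counts[ad] += 1
        sums[ad] += reward
    return counts, sums


if __name__ == "__main__":
    # Hypothetical click-through rates, used only to simulate rewards.
    rates = [0.05, 0.12, 0.30]
    rng = random.Random(0)
    counts, sums = run_ucb(lambda ad: 1.0 if rng.random() < rates[ad] else 0.0,
                           num_ads=len(rates), num_rounds=1000)
    print("selections per ad:", counts)
```

Because Δi(n) shrinks as Ni(n) grows, an ad that has rarely been shown keeps a wide interval and is still selected from time to time, while ads with a high average reward are selected most often.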

